line-log: support stat and check diff formats with -L#2152
Open
mmontalbo wants to merge 6 commits into
Open
Conversation
9fa8ab9 to
cb71409
Compare
The line-range filter that mm/line-log-cleanup added uses names that
obscure its model. The cursors lno_post/lno_pre and the index lno_0
share an lno_ prefix but conflate the pre/post-image axis with the
0-based/1-based axis, the hunk state is a flat set of rhunk_* fields,
and the filter-state pointer is just s (or f).
The filter bridges two layers of diff.c, and its fields already used
each layer's vocabulary, in cryptic form. Spell them out to the form
the rest of the file uses so the patches that follow simplify and fix
the filter against clearer names:
- lno_post/lno_pre -> lno_in_postimage/lno_in_preimage, the
line-number cursors, matching the counters in struct emit_callback
- lno_0 -> idx_in_postimage, the 0-based range index
- the hunk-header geometry stays old/new (old_begin, new_begin, and
counts) to match the xdiff_emit_hunk_fn callback and the
"@@ -<old> +<new> @@" header it feeds, but moves from flat rhunk_*
fields into a "hunk" sub-struct, so accesses read
filter->hunk.old_begin
- flush_rhunk -> flush_hunk
- the filter-state pointer in each callback: s/f -> filter
Also rename the struct line_range_callback to line_range_filter: it is
a filter over xdiff output, not merely a callback.
No behavior change.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
The filter buffered '-' lines in a pending_rm strbuf, deferring their classification until a '+' or ' ' line revealed the post-image position. That buffering is unnecessary: a removal occupies no post-image line, so it does not advance lno_in_postimage, and xdiff emits removals before additions within a change. A '-' therefore arrives while lno_in_postimage already holds the index the following '+'/' ' will occupy, and can be classified against the ranges as it arrives. The buffering also hid a bug: flush_hunk() drained pending_rm into the range hunk whenever the hunk was active, even after lno_in_postimage had advanced past the tracked range, so a deletion just after the tracked function leaked into the patch. Classifying each line as it arrives removes the pending_rm buffer, the discard_pending_rm() helper, three struct fields, and makes that bug impossible by construction. Document the coordinate model: a block comment on struct line_range_filter states it (the pre/post-image cursors, the 0-based idx_in_postimage, removals classified by the following line) with a worked example, and assert_hunk_counts() cross-checks the accumulated hunk counts against the buffered lines. Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
The synthetic hunk header that flush_hunk() builds emitted the begin unadjusted for a side with no lines, naming the line at the change instead of the line before it. In t4211 this produced @@ -25,0 +18,9 @@ whose old side names a nonexistent line 25 in a 24-line file. The unified-diff convention that git diff and xdl_emit_hunk_hdr() follow is to name the line just before the change (-24 here). The convention dates to 12da1d1 (Implement line-history search (git log -L), 2013-03-28), whose hand-rolled diff printer derived headers from range arithmetic without that adjustment. It special-cased only file creation, so a creation matched git diff but an interior insertion did not. 86e986f (line-log: route -L output through the standard diff pipeline, 2026-03-17) reimplemented header emission but preserved this convention, so the off-by-one is inherited rather than new. Blaming a changed fixture line can mislead here: for t4211's vanishes-early that transition turned a "\ No newline at end of file" modification (@@ -22,1 +24,1 @@) into a pure insertion (@@ -23,0 +24,1 @@), so that zero-count header is newly shaped by xdiff even though the rule it follows predates it. Apply the convention in a unified_begin() helper: when a side has no lines, name the preceding line. hunk.*_begin is seeded from xdiff's hunk header, which already made this adjustment for a pure creation or deletion (begin 0), so unified_begin() leaves a 0 alone. Regenerate the two affected t4211 fixtures. Every other hunk, including file creations, is unchanged. Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
builtin_diff() open-codes the line-range filter setup and teardown around its xdi_diff_outf() call: zero the struct, point it at the output callback, run the diff, flush the trailing range hunk, and release the buffer. The upcoming -L stat and check formats need the same sequence. Extract line_range_filter_init() for the setup and a line_range_filter_diff() helper that runs an initialized filter through xdi_diff_outf(), flushes the final range hunk, and releases it, returning the latched error. builtin_diff() now does init + line_range_filter_diff(); the next two patches reuse them in builtin_diffstat() and builtin_checkdiff() instead of repeating the boilerplate. No behavior change. Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
Reuse the line_range_filter in builtin_diffstat() to produce range-scoped statistics. When a filepair carries line_ranges, the filter wraps diffstat_consume() as its output callback, forwarding only in-range lines for counting. flush_hunk() replays buffered content through diffstat_consume(), which ignores synthetic @@ headers since it only counts '+' and '-' lines. Expand the output format allowlist in setup_revisions() to accept --stat, --numstat, and --shortstat with -L. Leave --dirstat out of the allowlist so it is rejected like any other unsupported format. It summarizes change as a per-directory percentage, which is degenerate for the usual single tracked file (always 100% of its directory) and, in its default changed-bytes mode, would measure whole-file damage that ignores the tracked range entirely. --numstat already reports the exact range-scoped per-file counts, so -L gains nothing from --dirstat. Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
builtin_checkdiff() runs its own xdiff pass to detect whitespace errors in newly added lines. When -L is active, the check should be scoped to the tracked line ranges rather than the whole file. Reuse the line_range_filter to wrap checkdiff_consume(), the same pattern already used for patch output and diffstat. The filter forwards only in-range lines for whitespace checking. checkdiff reports the file line number of each error, which it normally learns from the hunk header via checkdiff_consume_hunk(). The filter synthesizes its own hunk headers, so give it an optional hunk callback and route checkdiff_consume_hunk() through it; this sets the post-image position before the in-range lines are replayed. Without it the reported line numbers would count from the start of the range hunk rather than the start of the file. Add DIFF_FORMAT_CHECKDIFF to the -L output format allowlist in setup_revisions() so that -L --check is accepted, and list --check among the supported formats in the documentation. Add tests covering that whitespace errors are reported, scoped to the tracked range, and labeled with the correct file line number. Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
cb71409 to
7977925
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This series extends
git log -Lto support the diff stat formatsand --check, with all output scoped to the tracked line ranges.
It builds on top of the mm/line-log-cleanup topic [1], which integrated
-L with the standard log output pipeline and taught it the non-patch
formats --raw, --name-only, --name-status, and --summary.
With these patches the following formats also work with -L and report
range-scoped results:
--stat, --numstat, --shortstat: counts cover only the lines inside
the tracked range, not the whole file.
--check: whitespace errors are reported only for added lines inside
the tracked range, with the correct file line numbers.
The --dirstat format is deliberately rejected. It summarizes a broad
change by reporting each directory's share of the total churn as a
percentage, answering "where is the change concentrated?" That is
degenerate for the handful of line ranges -L tracks: a single tracked
file is always 100% of its directory. Its default mode also measures
whole-file byte damage (via diffcore_count_changes(), outside the
line-based pipeline that -L scopes), so it could not honor the tracked
range even in principle. --numstat already gives the exact
range-scoped per-file counts.
The series leads by clarifying existing logic. The line-range filter
mm/line-log-cleanup added had names that obscured its model (cryptic
lno_ cursors conflating the pre/post-image and 0/1-based axes, a flat
hunk-state struct, and a one-letter state pointer (s/f)), and two of
its incorrect behaviors may have their origin in the confusing model:
a leaked deletion and an off-by-one hunk header. So patch 1 renames
and groups the filter state, and patch 2 documents the model, before
the fixes that read against it:
Patch 1: rename and group the filter for clarity. Spell the
cryptic names out to the file's own forms: the line-number cursors
to lno_in_preimage/lno_in_postimage (as in struct emit_callback)
and the range index to idx_in_postimage, while the hunk geometry
stays old/new (the xdiff_emit_hunk_fn convention) and moves into a
sub-struct. Name the filter pointer (filter) and rename the struct
to line_range_filter. No behavior change.
Patch 2: simplify the filter by classifying removals as they arrive,
dropping the pending_rm buffer and a latent flush_hunk() bug that
leaked deletions just past the range. Document the model it exposes
with a block comment and worked example, and cross-check the hunk
counts against the buffer. (This simplification was submitted by
itself previously [2] but did not advance, so it is re-included
here.)
Patch 3: fix a longstanding off-by-one in the synthetic hunk header
for a side with no lines (a pure insertion or deletion): it named
the line at the change rather than the line before it, diverging
from git diff. The convention dates to the original -L
implementation (12da1d1).
Patch 4: extract a line_range_filter_diff() helper that runs an
initialized filter through xdiff, flushes the final range hunk, and
releases it, and use it in builtin_diff(). The stat and check
patches reuse it.
Patch 5: reuse the filter in builtin_diffstat() for the stat
formats, extend the -L output-format allowlist, and reject --dirstat.
Patch 6: reuse the filter in builtin_checkdiff() and extend the
allowlist for --check.
[1] https://lore.kernel.org/git/pull.2094.v3.git.1780001267.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.2099.git.1777230630020.gitgitgadget@gmail.com/